Parallelization Techniques for Sparse Matrix Applications
Authors
Abstract
At run time, the inspector examines the indirect references and uses this information to fetch data and to perform the computation. Such runtime preprocessing techniques have been fairly well studied and successfully incorporated into compilers [14, 6]. The inspector/executor method incurs a runtime overhead for each inspector stage, and these overheads are greatly increased when references use multiple levels of indirection. To reduce these overheads in sparse-matrix problems, the sparse-array rolling (SAR) technique was proposed in [15]. This method uses special representations for distributed sparse matrices that allow efficient access to the nonzero elements. Initial evaluations of the SAR method were promising and led to its integration into the Vienna Fortran compiler through language extensions. For a detailed description of the compilation issues, we refer the reader to [15].

In this paper, we focus on the SAR runtime support and describe how it allows efficient access to the distributed sparse matrix. The performance of SAR depends on a wide range of parameters, such as the data distribution, the locality characteristics of the algorithm, and the sparsity of the input matrix. We have evaluated its performance under many combinations of these choices, and we describe the impact that each choice has on preprocessing costs, executor efficiency, and memory overhead.

The paper is structured as follows. Section 2 provides a general overview of sparse matrix parallelization. Section 3 provides an overview of the SAR technique and its runtime support. Section 4 introduces a sparse-matrix program and illustrates how it is parallelized using the SAR approach. We discuss the impact of distribution choices in Section 5. Section 6 provides a detailed performance evaluation. The last two sections discuss related work and the conclusions drawn from this work.
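As background for the discussion above, the sketch below illustrates the inspector/executor pattern for a loop with an indirect reference x[col[k]]. It is a minimal single-process sketch in C, not the SAR runtime or the Vienna Fortran interface described in the paper: the owned index range, the simulated gather, and all identifiers (schedule, x_global, x_loc) are illustrative assumptions.

/* Minimal, single-process sketch of the inspector/executor idea for a loop
 * with an indirect reference x[col[k]].  In a real distributed run the
 * "remote" values would be fetched with message passing; here the gather is
 * simulated so the sketch stays self-contained. */
#include <stdio.h>

#define NNZ   6
#define NLOC  4      /* locally owned entries of x */
#define NGLOB 8      /* global length of x         */

static double x_global[NGLOB] = {1, 2, 3, 4, 5, 6, 7, 8};  /* stands in for the distributed vector */

int main(void) {
    int    col[NNZ] = {0, 5, 2, 7, 1, 6};     /* indirection array: some indices are off-processor */
    double a[NNZ]   = {1, 1, 2, 2, 3, 3};
    double x_loc[NGLOB];                      /* local copy: owned part plus gathered ghost values */
    int    schedule[NNZ], nfetch = 0;

    /* Inspector: scan the indirection array once and record which
     * references fall outside the locally owned range [0, NLOC). */
    for (int k = 0; k < NNZ; k++)
        if (col[k] >= NLOC)
            schedule[nfetch++] = col[k];

    /* Communication phase: gather the off-processor values named by the
     * schedule (simulated here by copying from x_global). */
    for (int i = 0; i < NLOC; i++)   x_loc[i] = x_global[i];
    for (int i = 0; i < nfetch; i++) x_loc[schedule[i]] = x_global[schedule[i]];

    /* Executor: run the original loop; every x[col[k]] is now local. */
    double y = 0.0;
    for (int k = 0; k < NNZ; k++)
        y += a[k] * x_loc[col[k]];

    printf("nfetch = %d, y = %g\n", nfetch, y);
    return 0;
}

In a distributed-memory run, the inspector output would drive a communication schedule (for example, a gather of off-processor vector elements); that preprocessing cost is exactly what grows with each additional level of indirection.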
Similar resources
Web-Site-Based Partitioning Techniques for Efficient Parallelization of the PageRank Computation
The efficiency of the PageRank computation is important since the constantly evolving nature of the Web requires this computation to be repeated many times. PageRank computation includes repeated iterative sparse matrix-vector multiplications. Due to the enormous size of the Web matrix to be multiplied, PageRank computations are usually carried out on parallel systems. Graph and hypergraph par... (see the CRS sketch after this list for the underlying matrix-vector kernel)
Domain Decomposition Based High Performance Parallel Computing
The study deals with the parallelization of finite-element-based Navier-Stokes codes using domain decomposition and state-of-the-art sparse direct solvers. There has been significant improvement in the performance of sparse direct solvers. Parallel sparse direct solvers are not found to exhibit good scalability. Hence, the parallelization of sparse direct solvers is done using domain decomposition t...
Applicability of Program Comprehension to Sparse Matrix Computations
Space-efficient data structures for sparse matrices typically yield programs in which not all data dependencies can be determined at compile time. Automatic parallelization of such codes is usually done at run time, e.g. by applying the inspector/executor technique, incurring tremendous overhead. Program comprehension techniques have been shown to improve automatic parallelization of dense ma...
Parallelization Techniques for Sparse Matrix Applications
Sparse matrix problems are difficult to parallelize efficiently on distributed memory machines since data is often accessed indirectly. Inspector/executor strategies, which are typically used to parallelize loops with indirect references, incur substantial run-time preprocessing overheads when references with multiple levels of indirection are encountered, a frequent occurrence in sparse matrix al...
Preconditioning Techniques for Large Linear Systems: A Survey
This article surveys preconditioning techniques for the iterative solution of large linear systems, with a focus on algebraic methods suitable for general sparse matrices. Covered topics include progress in incomplete factorization methods, sparse approximate inverses, reorderings, parallelization issues, and block and multilevel extensions. Some of the challenges ahead are also discussed. An e...
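Several of the entries above revolve around the same kernel: a sparse matrix-vector product over a space-efficient storage format such as compressed row storage (CRS). As a point of reference, here is a minimal CRS sketch in C; the 3x3 matrix and all identifiers are made up for illustration and are not taken from any of the listed papers.

/* Illustrative sketch of the sparse matrix-vector product y = A*x with A in
 * compressed row storage (CRS).  The tiny 3x3 matrix is made up for the
 * example; it is not data from any of the cited papers. */
#include <stdio.h>

int main(void) {
    /* A = [10  0  2]
     *     [ 0  3  0]
     *     [ 1  0  4]   stored row by row, nonzeros only. */
    int    row_ptr[4] = {0, 2, 3, 5};      /* start of each row in val/col_idx */
    int    col_idx[5] = {0, 2, 1, 0, 2};   /* column of each nonzero           */
    double val[5]     = {10, 2, 3, 1, 4};
    double x[3] = {1, 1, 1}, y[3];

    for (int i = 0; i < 3; i++) {          /* one dot product per row */
        y[i] = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            y[i] += val[k] * x[col_idx[k]];   /* indirect access through col_idx */
    }

    for (int i = 0; i < 3; i++) printf("y[%d] = %g\n", i, y[i]);
    return 0;
}

The indirect access through col_idx is precisely what motivates runtime techniques such as inspector/executor or SAR once the vector x is distributed across processors.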
Journal:
Volume / Issue:
Pages: -
Publication date: 1996